AITopics | Monterey

Collaborating Authors

Monterey

Calibrating Scientific Foundation Models with Inference-Time Stochastic Attention

Yadav, Akash, Adebiyi, Taiwo A., Zhang, Ruda

arXiv.org Machine Learning, Apr-22-2026

Transformer-based scientific foundation models are increasingly deployed in high-stakes settings, but current architectures give deterministic outputs and provide limited support for calibrated predictive uncertainty. We propose Stochastic Attention, a lightweight inference-time modification that randomizes attention by replacing softmax weights with normalized multinomial samples controlled by a single concentration parameter, and produces predictive ensembles without retraining. To set this parameter, we introduce a calibration objective that matches the stochastic attention output with the target, yielding an efficient univariate post-hoc tuning problem. We evaluate this mechanism on two scientific foundation models for weather and timeseries forecasting along with an additional regression task. Across benchmarks against uncertainty-aware baselines, we find that Stochastic Attention achieves the strongest native calibration and the sharpest prediction intervals at comparable coverage, while requiring only minutes of post-hoc tuning versus days of retraining for competitive baselines.
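
The core mechanism in this abstract is replacing one attention head's softmax weights with a normalized multinomial sample. It can be sketched for a single query in plain Python. This is a minimal sketch under stated assumptions, not the authors' implementation. It assumes the concentration parameter is the number of multinomial trials n, so that as n grows the sampled weights approach the ordinary softmax weights. The helper names (`stochastic_weights`, `ensemble`, `mean_and_std`) are hypothetical, and the paper's calibration objective is not reproduced.

```python
import math
import random
from collections import Counter
from typing import List, Sequence


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over one query's attention scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def stochastic_weights(scores: Sequence[float], n: int, rng: random.Random) -> List[float]:
    """Normalized multinomial sample centred on the softmax weights.

    ASSUMPTION: the concentration parameter is the number of trials n;
    larger n keeps the sample closer to softmax(scores).
    """
    probs = softmax(scores)
    draws = rng.choices(range(len(probs)), weights=probs, k=n)
    counts = Counter(draws)
    return [counts[i] / n for i in range(len(probs))]


def attend(weights: Sequence[float], values: Sequence[Sequence[float]]) -> List[float]:
    """Attention output: weighted sum of the value vectors."""
    dim = len(values[0])
    return [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]


def ensemble(scores, values, n: int, members: int, seed: int = 0) -> List[List[float]]:
    """Draw `members` stochastic attention outputs for one query (no retraining)."""
    rng = random.Random(seed)
    return [attend(stochastic_weights(scores, n, rng), values) for _ in range(members)]


def mean_and_std(samples: Sequence[Sequence[float]]):
    """Per-dimension mean and population standard deviation of an ensemble."""
    dim = len(samples[0])
    count = len(samples)
    means = [sum(s[d] for s in samples) / count for d in range(dim)]
    stds = [math.sqrt(sum((s[d] - means[d]) ** 2 for s in samples) / count)
            for d in range(dim)]
    return means, stds


if __name__ == "__main__":
    scores = [2.0, 1.0, 0.1]
    values = [[1.0], [0.0], [-1.0]]
    for n in (10, 1000):
        mu, sd = mean_and_std(ensemble(scores, values, n, members=200))
        print(f"n={n}: mean={mu[0]:.3f} std={sd[0]:.3f}")
```

The scores and values are fixed across ensemble members, so all of the spread comes from the sampled weights. Smaller n yields a wider predictive ensemble, and that single knob is what a univariate post-hoc tuning step would adjust.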

artificial intelligence, calibration, machine learning, (18 more...)

arXiv.org Machine Learning

2604.1953

Country:

North America > United States > Texas > Harris County > Houston (0.14)
North America > United States > New York > New York County > New York City (0.04)
North America > United States > California > Monterey County > Monterey (0.04)
Europe > France > Hauts-de-France > Nord > Lille (0.04)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.48)

Forethought and Hindsight in Credit Assignment (Camera Ready)

Neural Information Processing Systems, Mar-14-2026, 06:58:46 GMT

backward model, international conference, learning, (13 more...)

Neural Information Processing Systems

Country:

North America > Canada > Quebec > Montreal (0.14)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
(9 more...)

Genre: Research Report > New Finding (0.68)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

RGMDT: Return-Gap-Minimizing Decision Tree Extraction in Non-Euclidean Metric Space

Neural Information Processing Systems, Feb-19-2026, 04:31:38 GMT

In this paper, we establish an upper bound on the return gap between the oracle expert policy and an optimal decision tree (DT) policy. This enables us to recast the DT extraction problem as a novel non-Euclidean clustering problem over each agent's local observation and action-value space, with action values as cluster labels and the upper bound on the return gap as the clustering loss.

artificial intelligence, machine learning, reinforcement learning, (17 more...)

Neural Information Processing Systems

Country:

North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
North America > United States > California > Monterey County > Monterey (0.04)
Europe > Ireland > Leinster > County Dublin > Dublin (0.04)
Europe > Finland > Northern Savo > Kuopio (0.04)

Genre: Research Report (0.67)

Industry: Government (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.94)

DAT: Improving Adversarial Robustness via Generative Amplitude Mix-up in Frequency Domain
Fengpeng Li, Kemou Li, Haiwei Wu

Neural Information Processing Systems, Feb-18-2026, 12:14:48 GMT

Recent studies show that adversarial attacks disproportionately impact the patterns within the phase of the sample's frequency spectrum--typically containing

artificial intelligence, machine learning, natural language, (18 more...)

Neural Information Processing Systems

Country:

North America > United States > California > Los Angeles County > Long Beach (0.14)
North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
Asia > Macao (0.04)
(25 more...)

Genre: Research Report > New Finding (0.48)

Industry:

Information Technology > Security & Privacy (0.36)
Government > Military (0.36)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.92)
Information Technology > Data Science (0.68)

Verified Safe Reinforcement Learning for Neural Network Dynamic Models

Neural Information Processing Systems, Feb-18-2026, 07:21:00 GMT

Learning reliably safe autonomous control is one of the core problems in trustworthy autonomy.

controller, machine learning, reinforcement learning, (16 more...)

Neural Information Processing Systems

Country:

North America > United States > Illinois > Champaign County > Urbana (0.04)
North America > United States > California > Monterey County > Monterey (0.04)
Europe > Germany > Baden-Württemberg > Karlsruhe Region > Heidelberg (0.04)

Genre: Research Report > Experimental Study (0.93)

Industry: Information Technology (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Parallelizing Linear Transformers with the Delta Rule over Sequence Length
Songlin Yang, Bailin Wang, Yu Zhang, Yikang Shen, Yoon Kim
Massachusetts Institute of Technology, Soochow University

Neural Information Processing Systems, Feb-18-2026, 06:03:05 GMT

Transformers with linear attention (i.e., linear transformers) and state-space models have recently been suggested as a viable linear-time alternative to transformers with softmax attention. However, these models still underperform transformers, especially on tasks that require in-context retrieval. While more expressive variants of linear transformers, which replace the additive update in linear transformers with the delta rule [DeltaNet; 101], have been found to be more effective at associative recall, existing algorithms for training such models do not parallelize over sequence length and are thus inefficient to train on modern hardware. This work describes a hardware-efficient algorithm for training linear transformers with the delta rule, which exploits a memory-efficient representation for computing products of Householder matrices [11]. This algorithm allows us to scale up DeltaNet to standard language modeling settings. We train a 1.3B model for 100B tokens and find that it outperforms recent linear-time baselines such as Mamba [31] and GLA [124] in terms of perplexity and zero-shot performance on downstream tasks. We also experiment with two hybrid models which combine DeltaNet layers with (1) sliding-window attention layers every other layer or (2) two global attention layers, and find that these hybrids outperform strong transformer baselines.
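
The contrast between the additive update of linear attention and the delta rule can be sketched as a token-by-token recurrence over a d_v x d_k state S. This is a minimal sequential sketch, assuming the standard DeltaNet update S_t = S_{t-1} - beta_t (S_{t-1} k_t - v_t) k_t^T with a per-token write strength beta_t in (0, 1] and readout o_t = S_t q_t. It is deliberately the slow sequential form, not the hardware-efficient, sequence-parallel algorithm the abstract describes. Rewriting the update as S_t = S_{t-1}(I - beta_t k_t k_t^T) + beta_t v_t k_t^T exposes the generalized Householder transition I - beta_t k_t k_t^T whose products that algorithm represents compactly. All function names are hypothetical.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def zeros(rows: int, cols: int) -> Matrix:
    """All-zero d_v x d_k state."""
    return [[0.0] * cols for _ in range(rows)]


def matvec(S: Matrix, x: Sequence[float]) -> List[float]:
    """Compute S @ x for a d_v x d_k state and a length-d_k vector."""
    return [sum(S[i][j] * x[j] for j in range(len(x))) for i in range(len(S))]


def additive_step(S: Matrix, k: Sequence[float], v: Sequence[float]) -> Matrix:
    """Linear-attention update: S <- S + v k^T (old associations are never erased)."""
    return [[S[i][j] + v[i] * k[j] for j in range(len(k))] for i in range(len(v))]


def delta_step(S: Matrix, k: Sequence[float], v: Sequence[float], beta: float) -> Matrix:
    """Delta-rule update: S <- S - beta (S k - v) k^T.

    S k is what the memory currently returns for key k; the update moves that
    retrieval towards the new value v by a fraction beta.
    """
    err = [pred - target for pred, target in zip(matvec(S, k), v)]
    return [[S[i][j] - beta * err[i] * k[j] for j in range(len(k))] for i in range(len(v))]


def recurrent_outputs(keys, values, queries, betas, rule: str = "delta") -> List[List[float]]:
    """Run the recurrence sequentially and read out o_t = S_t q_t at each step."""
    S = zeros(len(values[0]), len(keys[0]))
    outputs = []
    for k, v, q, beta in zip(keys, values, queries, betas):
        S = delta_step(S, k, v, beta) if rule == "delta" else additive_step(S, k, v)
        outputs.append(matvec(S, q))
    return outputs


if __name__ == "__main__":
    k = [1.0, 0.0]  # unit-norm key, written to twice
    keys, values = [k, k], [[1.0, 2.0], [3.0, -1.0]]
    print(recurrent_outputs(keys, values, [k, k], [1.0, 1.0], "delta")[-1])
    print(recurrent_outputs(keys, values, [k, k], [1.0, 1.0], "additive")[-1])
```

When two values are written under the same unit-norm key with beta = 1, the delta rule returns the newer value ([3.0, -1.0]) while the additive update returns their sum ([4.0, 1.0]). This overwrite behavior is one intuition for why the delta rule helps with associative recall.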

arxiv preprint, large language model, machine learning, (17 more...)

Neural Information Processing Systems

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Africa > Rwanda > Kigali > Kigali (0.04)
North America > United States > Maryland > Baltimore (0.04)
(19 more...)

Genre: Research Report > Experimental Study (1.00)

Industry:

Education (0.67)
Health & Medicine (0.48)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.88)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.67)

c72861451d6fa9dfa64831102b9bb71a-Paper-Conference.pdf

Neural Information Processing Systems, Feb-18-2026, 02:22:47 GMT

machine learning, natural language, trajectory, (14 more...)

Neural Information Processing Systems

Country:

North America > United States > Louisiana > Orleans Parish > New Orleans (0.05)
North America > Canada > British Columbia > Vancouver (0.04)
Asia > South Korea > Daegu > Daegu (0.04)
(16 more...)

Genre: Research Report > Experimental Study (1.00)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
(2 more...)

e7681dd6fe16052433ab68cd1555bdc9-Paper-Conference.pdf

Neural Information Processing Systems, Feb-17-2026, 17:13:46 GMT

artificial intelligence, machine learning, optimization problem, (17 more...)

Neural Information Processing Systems

Country:

South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
North America > United States > California > Los Angeles County > Long Beach (0.04)
(14 more...)

Genre: Research Report > New Finding (0.45)

Industry: Education (0.45)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)
Information Technology > Artificial Intelligence > Speech (0.68)
Information Technology > Artificial Intelligence > Vision (0.68)
(2 more...)

Inflationary Flows: Calibrated Bayesian Inference with Diffusion-Based Models

Neural Information Processing Systems, Feb-17-2026, 11:35:57 GMT

Beyond estimating parameters of interest from data, one of the key goals of statistical inference is to properly quantify uncertainty in these estimates.

artificial intelligence, experiment, machine learning, (16 more...)

Neural Information Processing Systems

Country: